Abstract

Positive definite matrices arise in a dazzling variety of applications. They enjoy this ubiquity 
perhaps due to their rich geometric structure. In particular, positive definite matrices form a convex 
cone whose strict interior is also a differentiable Riemannian manifold. Building on the conic and 
manifold views, we advocate the Symmetric Stein Divergence (S-Divergence) as a 'natural' distance- 
like function on positive matrices. We motivate its naturalness in a sequence of results that connect 
it to the Riemannian metric on positive matrices. Going beyond, we show that the S-Divergence 
has many interesting properties of its own: most notably, its square-root turns out to be a metric. 
We discuss some properties of this metric, including Hilbert space embeddability, before concluding 
the paper with a list of open problems. We hope that our paper encourages others to further study 
the S-Divergence and its applications. 

Keywords: Stein's loss; Bregman matrix divergence; Log Determinant; Affine invariant metric;
Hilbert space embedding; Jensen-Shannon divergence 



1 Introduction 



Positive definite matrices provide a generalization of positive real numbers to the noncommutative
world of matrices. Not surprisingly, positive definite matrices abound in a vast variety of applications.
Summarizing these applications would be an exercise in futility, so we avoid it, and instead refer the
reader to a delightful theoretical account by Bhatia [2007].

Positive definite (henceforth, positive) matrices pervade in part due to their rich geometric structure:
(i) they form a non-polyhedral, closed, self-dual convex cone; and (ii) the strict interior of this cone is
actually a differentiable Riemannian manifold. The conic view enjoys great importance in convex
optimization [Nesterov and Nemirovskii; Boyd and Vandenberghe, 2004; Ben-Tal and Nemirovskii, 2001],
while the manifold view plays diverse roles, e.g., in matrix analysis and differential geometry
[Bhatia, 2007; Lang, 1999; Ballmann et al., 1985]. Our paper offers a new conceptual though
informal link between these two geometric views.

First, we fix basic notation. The letter $\mathcal{H}$ denotes some Hilbert space, though for the most part it
stands for $\mathbb{C}^n$. The inner product between two vectors $x$ and $y$ in $\mathcal{H}$ is written as $\langle x, y\rangle := x^*y$ (where
$x^*$ denotes the conjugate transpose). A matrix $A \in \mathbb{C}^{n\times n}$ is called positive if

$$\langle x, Ax\rangle > 0 \quad\text{for all } x \neq 0, \tag{1.1}$$

which we also denote by writing $A > 0$. We may also speak of positive semidefinite matrices, for which
$\langle x, Ax\rangle \ge 0$ for all $x \in \mathcal{H}$; such matrices are denoted $A \ge 0$. The operator inequality $A \ge B$ means
$A - B \ge 0$. We denote the set of $n \times n$ positive matrices by $\mathbb{P}_n$. The Frobenius norm of a matrix
$X \in \mathbb{C}^{m\times n}$ is defined as $\|X\|_F = \sqrt{\mathrm{Tr}(X^*X)}$, and $\|X\|$ denotes the standard operator 2-norm. Let $f$
be an analytic function on $\mathbb{C}$; for a diagonalizable matrix $A = U\Lambda U^*$, $f(A)$ equals $Uf(\Lambda)U^*$.

Second, we introduce a key function, the natural Riemannian metric for the manifold of positive
matrices, defined as (e.g., [Bhatia, 2007, Ch. 6])

$$\delta_R(X, Y) := \|\log(Y^{-1/2} X Y^{-1/2})\|_F, \qquad X, Y > 0, \tag{1.2}$$

where $\log(\cdot)$ denotes the matrix logarithm. Definition (1.2) suggests that algorithms involving $\delta_R$
might be computationally demanding: for example, to compute $\delta_R$, we essentially need the generalized
eigenvalues of $X$ and $Y$.

Third, we introduce our main player, the Symmetric Stein Divergence¹ (henceforth, S-Divergence),
defined as

$$S(X, Y) := \log\det\Bigl(\frac{X + Y}{2}\Bigr) - \frac{1}{2}\log\det(XY) \quad\text{for } X, Y > 0. \tag{1.3}$$

Computationally, (1.3) is less demanding than (1.2), largely because it requires no eigenvalue
computations; three Cholesky decompositions suffice.
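The computational remark above can be illustrated with a minimal NumPy sketch. It assumes real symmetric positive definite inputs; the function names (`logdet_spd`, `s_divergence`, `riemannian_distance`) are illustrative and not from the paper. The S-Divergence uses three Cholesky factorizations, whereas $\delta_R$ is computed here from the eigenvalues of a matrix similar to $Y^{-1}X$.

```python
import numpy as np

def logdet_spd(M):
    """log det(M) for a symmetric positive definite M, from its Cholesky factor."""
    L = np.linalg.cholesky(M)
    return 2.0 * np.sum(np.log(np.diag(L)))

def s_divergence(X, Y):
    """S(X, Y) of (1.3): one Cholesky factorization each for (X+Y)/2, X, and Y."""
    return float(logdet_spd((X + Y) / 2.0) - 0.5 * (logdet_spd(X) + logdet_spd(Y)))

def riemannian_distance(X, Y):
    """delta_R(X, Y) of (1.2), via the eigenvalues of a matrix similar to Y^{-1} X."""
    Linv = np.linalg.inv(np.linalg.cholesky(Y))
    # L^{-1} X L^{-T} is symmetric and has the same eigenvalues as Y^{-1/2} X Y^{-1/2}.
    lam = np.linalg.eigvalsh(Linv @ X @ Linv.T)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 4, 4))
    X, Y = G @ G.T + np.eye(4), H @ H.T + np.eye(4)
    print(s_divergence(X, Y), riemannian_distance(X, Y))
```

For $1\times 1$ inputs the two functions reduce to $\log[(x+y)/(2\sqrt{xy})]$ and $|\log(x/y)|$, which gives a quick sanity check.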

Given the above definitions, $\delta_R$ and $S$ might appear to be very different. However, a closer inspection
reveals several connections between $\delta_R$ and $S$ that were hitherto unknown. In the sequel, we show
a sequence of results which indicate that, despite being numerically different, the S-Divergence has
qualitative similarities to the Riemannian metric. But that is not all; the S-Divergence also enjoys
several intriguing properties of its own. Most notably, its square root actually turns out to be a metric.

One may ask, however, beyond the aesthetic mathematics associated with the S-Divergence, why
should we care about it? One major reason is pragmatic: $S$ is significantly cheaper to compute,
which leads to huge savings in applications that depend on a large number of distance computations,
e.g., Cherian et al. [2011]. Additional motivation comes from the distinguished antecedents of the
S-Divergence, as revealed below.



1.1 Background 

Let $\phi$ be a real-valued strictly convex function on Hermitian matrices. Then, $\phi$ generates the Bregman
matrix divergence (see e.g., Lewis [1996], Bauschke and Borwein [1997]):

$$D_\phi(X, Y) := \phi(X) - \phi(Y) - \langle \nabla\phi(Y), X - Y\rangle. \tag{1.4}$$

If we choose $\phi(X) = -\log\det(X)$, the barrier function for the cone of positive matrices [Boyd and Vandenberghe,
2004], then (1.4) results in the Bregman divergence

$$D_\ell(X, Y) := \mathrm{Tr}(XY^{-1}) - \log\det(XY^{-1}) - n, \tag{1.5}$$

where $n$ is the size of the matrices. This divergence is better known as Stein's loss [Stein, 1956] or the
LogDet-Divergence [Kulis et al., 2009]. Bregman divergences are nonnegative and definite, but almost
always asymmetric. Several authors have, therefore, considered symmetrized versions, though usually for
vectors, not matrices [Banerjee et al.; Nielsen, 2009]. Amongst these, the Jensen-Shannon symmetrization

$$D_\phi^{JS}(X, Y) := \tfrac{1}{2}\bigl[D_\phi(X, (X + Y)/2) + D_\phi(Y, (X + Y)/2)\bigr] \tag{1.6}$$

is perhaps the most preferable. Applying (1.6) to Stein's loss (1.5), we obtain the S-Divergence (1.3),
which explains the name "Symmetric Stein Divergence."
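The relation between Stein's loss and the S-Divergence can be checked numerically. The sketch below is an illustration only: it assumes real symmetric positive definite inputs and takes the Jensen-Shannon symmetrization with the factor $1/2$, which is the normalization under which it agrees with (1.3); the function names are hypothetical.

```python
import numpy as np

def stein_loss(X, Y):
    """Stein's loss: Tr(X Y^{-1}) - log det(X Y^{-1}) - n."""
    n = X.shape[0]
    M = np.linalg.solve(Y, X)  # Y^{-1} X has the same trace and determinant as X Y^{-1}
    _, logdet = np.linalg.slogdet(M)
    return float(np.trace(M) - logdet - n)

def js_stein(X, Y):
    """Jensen-Shannon symmetrization of Stein's loss around the midpoint (X+Y)/2."""
    mid = (X + Y) / 2.0
    return 0.5 * (stein_loss(X, mid) + stein_loss(Y, mid))

def s_divergence(X, Y):
    """S(X, Y) = log det((X+Y)/2) - 0.5 * log det(XY)."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 4, 4))
    X, Y = G @ G.T + np.eye(4), H @ H.T + np.eye(4)
    print(js_stein(X, Y), s_divergence(X, Y))  # the two values should agree
    print(stein_loss(X, Y), stein_loss(Y, X))  # Stein's loss itself is asymmetric
```

The trace terms cancel because $\mathrm{Tr}((X+Y)M^{-1}) = 2n$ when $M = (X+Y)/2$, which is why only log-determinants survive.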

It is worth noting that Stein's loss itself shows up in various contexts. We mention a few for the
interested reader: (i) Statistics [Stein, 1956]; (ii) Information theory [Cover and Thomas, 1991], as the
differential relative entropy between two multivariate Gaussians with the same mean; (iii) Optimization, when
deriving the famous Broyden-Fletcher-Goldfarb-Shanno (BFGS) updates [Nocedal and Wright, 1999];
(iv) Machine Learning, as a Bregman matrix divergence [Bauschke and Borwein, 1997; Dhillon and Tropp,
2007], or for kernel and metric learning [Kulis et al., 2009]. The symmetric divergence (1.3) is closely
related to the Jensen-Shannon divergence between two multivariate Gaussians [Cover and Thomas, 1991]
and to the Bhattacharyya distance in statistics [Bhattacharyya, 1943].

¹ It is a divergence because although nonnegative, definite, and symmetric, it is not a metric.



1.2 Basic Properties 

We prove below a few basic properties of $S$. Let $\lambda(X)$ denote the vector of eigenvalues of $X$ (in any
order), and $\mathrm{Eig}(X)$ denote the diagonal matrix that has $\lambda(X)$ as its diagonal.

Proposition 1.1. Let $A, B, C > 0$ be $p \times p$ matrices; the S-Divergence satisfies:

(i) $S(I, A) = S(I, \mathrm{Eig}(A))$;
(ii) $S(A, B) = S(PAQ, PBQ)$, where $P$ and $Q$ are invertible;
(iii) $S(A, B) = S(A^{-1}, B^{-1})$;
(iv) $S(A \otimes B, A \otimes C) = p\,S(B, C)$.

Proof. (i) Trivial, as $\det(I + A) = \prod_i \lambda_i(I + A) = \prod_i (1 + \lambda_i(A))$. (ii) Follows easily upon noting that

$$\frac{\det(PAQ + PBQ)}{[\det(PAQ)]^{1/2}[\det(PBQ)]^{1/2}} = \frac{\det(P)\cdot\det(A + B)\cdot\det(Q)}{\det(P)\cdot[\det(A)]^{1/2}[\det(B)]^{1/2}\cdot\det(Q)}.$$

(iii) This also follows easily, since

$$\frac{\det(A^{-1} + B^{-1})}{[\det(A^{-1})]^{1/2}[\det(B^{-1})]^{1/2}} = \frac{\det(A)\cdot\det(A^{-1} + B^{-1})\cdot\det(B)}{[\det(A)]^{1/2}[\det(B)]^{1/2}},$$

and $A(A^{-1} + B^{-1})B = A + B$.

(iv) Notice that $A \otimes B + A \otimes C = A \otimes (B + C)$, and $\det(A \otimes B) = \det(A)^p\det(B)^p$. □

The most useful corollary of Prop. 1.1 is the following invariance result.

Corollary 1.2. Let $A, B > 0$; let $X$ be any invertible matrix. Then,

$$S(X^*AX, X^*BX) = S(A, B).$$
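The invariances of Proposition 1.1 and Corollary 1.2 are easy to probe numerically. The following sketch is an illustration under stated assumptions: real symmetric positive definite test matrices, a randomly drawn congruence matrix `X`, and hypothetical helper names. It reports the absolute gaps for congruence, inversion, and the Kronecker identity.

```python
import numpy as np

def s_div(X, Y):
    """S(X, Y) = log det((X+Y)/2) - 0.5 * log det(XY)."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def random_spd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T + np.eye(n)

def invariance_gaps(p=3, seed=0):
    """Absolute violations of Corollary 1.2 and Prop. 1.1 (iii), (iv) on random inputs."""
    rng = np.random.default_rng(seed)
    A, B, C = (random_spd(p, rng) for _ in range(3))
    X = rng.standard_normal((p, p)) + p * np.eye(p)  # a generic (assumed invertible) matrix
    base = s_div(A, B)
    return {
        "congruence": abs(s_div(X.T @ A @ X, X.T @ B @ X) - base),
        "inversion": abs(s_div(np.linalg.inv(A), np.linalg.inv(B)) - base),
        "kronecker": abs(s_div(np.kron(A, B), np.kron(A, C)) - p * s_div(B, C)),
    }

if __name__ == "__main__":
    print(invariance_gaps())  # all gaps should be at rounding-error level
```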

The next result reaffirms that $S(\cdot, \cdot)$ is a divergence; moreover, it shows that $S$ enjoys some limited
convexity (and concavity).

Proposition 1.3. Let $A, B > 0$. Then, (i) $S(A, B) \ge 0$ with equality if and only if $A = B$; (ii) for
fixed $B > 0$, $S(A, B)$ is convex in $A$ for $A \le (1 + \sqrt{2})B$, while for $A \ge (1 + \sqrt{2})B$, it is concave.

Proof. Since $S$ is a sum of Bregman divergences, property (i) follows from definition (1.6) itself.
Alternatively, strict log-concavity of the determinant yields

$$\det((A + B)/2) \ge [\det(A)]^{1/2}[\det(B)]^{1/2},$$

which holds with equality if and only if $A = B$; this, in turn, implies (i). Part (ii) follows upon analyzing
the second derivative of $S(A, B)$ with respect to $A$. Specifically, some algebra reveals that we can identify
the Hessian $\nabla^2 S(A, B)$ with the matrix

$$\tfrac{1}{2}(A^{-1} \otimes A^{-1}) - (A + B)^{-1} \otimes (A + B)^{-1}, \tag{1.7}$$

where $\otimes$ denotes the Kronecker product. Matrix (1.7) is positive for $A \le (1 + \sqrt{2})B$ and negative for
$A \ge (1 + \sqrt{2})B$, which implies the said convexity and concavity. □
2 Connections with $\delta_R$

In this section we develop connections between the S-Divergence and the Riemannian metric. We begin
by stating two known facts.

Lemma 2.1. Let $A \ge B > 0$; then

$$A^{-1} \le B^{-1}.$$

Proof. Classical fact; see e.g., [Horn and Johnson, 1985, Corollary 7.7.4]. □

Lemma 2.2. Let $A, B > 0$, and $t \in [0, 1]$. Then,

$$(tA + (1 - t)B)^{-1} \le tA^{-1} + (1 - t)B^{-1}.$$

Proof. Another classical result; see e.g., [Bhatia, 1997, Exercise V.1.15]. □



2.1 Contraction 



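Our first connection stems from the following important contraction property of $\delta_R$ (see e.g.,
[Bhatia, 2007, Exercise 6.5.4]):

$$\delta_R(A^t, B^t) \le t\,\delta_R(A, B), \quad\text{for } A, B > 0 \text{ and } t \in [0, 1]. \tag{2.1}$$

An analogous inequality also holds for $S$.

Theorem 2.3. Let $A, B > 0$. Then,

$$S(A^t, B^t) \le t\,S(A, B), \qquad t \in [0, 1]. \tag{2.2}$$

Proof. For $t \in [0, 1]$, the map $X \mapsto X^t$ is operator concave. Thus, $\tfrac{1}{2}(A^t + B^t) \le \bigl(\tfrac{A + B}{2}\bigr)^t$, so that using
monotonicity of the determinant we obtain

$$S(A^t, B^t) = \log\frac{\det\bigl(\tfrac{1}{2}(A^t + B^t)\bigr)}{\det(A^tB^t)^{1/2}} \le \log\frac{\det\bigl(\tfrac{1}{2}(A + B)\bigr)^t}{\det(AB)^{t/2}} = t\,S(A, B). \tag{2.3}$$

□

Remark 2.4. If in (2.2) we have $t \ge 1$, then we obtain the reverse inequality

$$S(A^t, B^t) \ge t\,S(A, B). \tag{2.4}$$

Inequality (2.4) follows from (2.2) upon considering $S(A^{1/t}, B^{1/t})$.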
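A small numerical experiment can illustrate the contraction (2.2) and its reversal (2.4). The sketch assumes real symmetric positive definite matrices and computes matrix powers through an eigendecomposition; `spd_power` and `contraction_gap` are illustrative names, not part of the paper.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def spd_power(M, t):
    """M^t for symmetric positive definite M, via M = V diag(w) V^T."""
    w, V = np.linalg.eigh(M)
    return (V * w ** t) @ V.T  # scales column j of V by w_j^t

def contraction_gap(A, B, t):
    """t*S(A, B) - S(A^t, B^t): nonnegative for t in [0, 1], nonpositive for t >= 1."""
    return t * s_div(A, B) - s_div(spd_power(A, t), spd_power(B, t))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 4, 4))
    A, B = G @ G.T + np.eye(4), H @ H.T + np.eye(4)
    for t in (0.25, 0.5, 0.75, 2.0, 3.0):
        print(t, contraction_gap(A, B, t))
```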

Our next result (Theorem 2.7) shows that $\delta_R$ and $S$ exhibit a similar monotonic property for matrix
powers. But before we state and prove this result, we recall some helpful notions.

Let $x$ and $y$ be vectors in $\mathbb{R}^n_+$. Denote by $z^\downarrow$ the vector obtained by sorting the elements of a vector
$z$ in decreasing order. We write

$$x \prec_{w\log} y \quad\text{if}\quad \prod_{j=1}^k x_j^\downarrow \le \prod_{j=1}^k y_j^\downarrow \quad\text{for } 1 \le k \le n; \tag{2.5}$$

$$x \prec_{\log} y \quad\text{if}\quad x \prec_{w\log} y \ \text{ and }\ \prod_{j=1}^n x_j^\downarrow = \prod_{j=1}^n y_j^\downarrow. \tag{2.6}$$

We note that (2.5) is called weak log-majorization, while (2.6) is called (strict) log-majorization
[Bhatia, 1997, Ch. 3]. Replacing products by sums, we get the usual notion of (weak) majorization
[Bhatia, 1997, Ch. 2]:

$$x \prec_w y \quad\text{if}\quad \sum_{j=1}^k x_j^\downarrow \le \sum_{j=1}^k y_j^\downarrow \quad\text{for } 1 \le k \le n. \tag{2.7}$$

Related to (2.6) is the following important theorem.
Theorem 2.5. Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a continuous function for which $f(e^r)$ is a convex monotonically
increasing function of $r$. If $x$ and $y \in \mathbb{R}^n_+$ are vectors for which $x \prec_{\log} y$, then

$$(f(x_1^\downarrow), \ldots, f(x_n^\downarrow)) \prec_w (f(y_1^\downarrow), \ldots, f(y_n^\downarrow)). \tag{2.8}$$

Proof. See e.g., Marshall et al. [2011]. □

Theorem 2.5 helps prove the following monotonicity result for determinants of sums.

Lemma 2.6. Let $A, B > 0$, and $1 \le t \le u < \infty$. Then,

$$\det{}^{1/t}\Bigl(\frac{A^t + B^t}{2}\Bigr) \le \det{}^{1/u}\Bigl(\frac{A^u + B^u}{2}\Bigr). \tag{2.9}$$

Proof. Let $P = A^{-1}$, and $Q = B$. To show (2.9), we may equivalently show that

$$\prod_{j=1}^n \Bigl(\frac{1 + \lambda_j(P^tQ^t)}{2}\Bigr)^{1/t} \le \prod_{j=1}^n \Bigl(\frac{1 + \lambda_j(P^uQ^u)}{2}\Bigr)^{1/u}. \tag{2.10}$$

Recall now the log-majorization (see e.g., [Bhatia, 1997, Theorem IX.2.9]):

$$\lambda^{1/t}(P^tQ^t) \prec_{\log} \lambda^{1/u}(P^uQ^u). \tag{2.11}$$

Applying Theorem 2.5 to (2.11) with $f(r) = \log(1 + r^u)$ yields the inequalities

$$\sum_{j=1}^k \log\bigl(1 + \lambda_j^{u/t}(P^tQ^t)\bigr) \le \sum_{j=1}^k \log\bigl(1 + \lambda_j(P^uQ^u)\bigr), \qquad 1 \le k \le n.$$

Using monotonicity of $\log$ and the function $r \mapsto r^{1/u}$, these inequalities imply that

$$\prod_{j=1}^n \Bigl(\frac{1 + \lambda_j^{u/t}(P^tQ^t)}{2}\Bigr)^{1/u} \le \prod_{j=1}^n \Bigl(\frac{1 + \lambda_j(P^uQ^u)}{2}\Bigr)^{1/u}.$$

But since $u \ge t$, the function $r \mapsto r^{u/t}$ is convex, which implies that

$$\prod_{j=1}^n \Bigl(\frac{1 + \lambda_j^{u/t}(P^tQ^t)}{2}\Bigr)^{1/u} \ge \prod_{j=1}^n \Bigl(\frac{1 + \lambda_j(P^tQ^t)}{2}\Bigr)^{1/t}.$$

Combining the last two inequalities yields (2.10). □



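Theorem 2.7. Let $A, B > 0$, and $1 \le t \le u < \infty$. Then,

$$t^{-1}\delta_R(A^t, B^t) \le u^{-1}\delta_R(A^u, B^u), \tag{2.12}$$

$$t^{-1}S(A^t, B^t) \le u^{-1}S(A^u, B^u). \tag{2.13}$$

Proof. (i) Note that $\delta_R(X, Y)$ can also be written as $\|\log(XY^{-1})\|_F$, understood as the Euclidean norm
of the logarithms of the eigenvalues of $XY^{-1}$. So, to prove (2.12) we must show that

$$\tfrac{1}{t}\|\log(A^tB^{-t})\|_F \le \tfrac{1}{u}\|\log(A^uB^{-u})\|_F,$$

or equivalently that

$$\|\log\lambda^{1/t}(A^tB^{-t})\|_2 \le \|\log\lambda^{1/u}(A^uB^{-u})\|_2. \tag{2.14}$$

To prove (2.14), first let $f(r) = |\log r|$; then apply Theorem 2.5 to (2.11) to obtain

$$|\log\lambda^{1/t}(A^tB^{-t})| \prec_w |\log\lambda^{1/u}(A^uB^{-u})|,$$

which quickly yields (2.14) since $\|\cdot\|_2$ is a symmetric gauge; see e.g., [Bhatia, 1997, Example II.3.13].
(ii) To prove (2.13) we must show that

$$\tfrac{1}{t}\log\det((A^t + B^t)/2) - \tfrac{1}{2t}\log\det(A^tB^t) \le \tfrac{1}{u}\log\det((A^u + B^u)/2) - \tfrac{1}{2u}\log\det(A^uB^u).$$

This inequality is immediate from Lemma 2.6 and monotonicity of $\log$, because the second term on
each side equals $\tfrac{1}{2}\log\det(AB)$. □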
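The monotonicity asserted by Theorem 2.7 can be observed numerically by tabulating $t^{-1}\delta_R(A^t, B^t)$ and $t^{-1}S(A^t, B^t)$ over an increasing grid of exponents. This is only a sketch for real symmetric positive definite inputs; the helper names are assumptions made for the illustration.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def delta_r(X, Y):
    """Riemannian distance from the eigenvalues of a matrix similar to Y^{-1} X."""
    Linv = np.linalg.inv(np.linalg.cholesky(Y))
    lam = np.linalg.eigvalsh(Linv @ X @ Linv.T)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def spd_power(M, t):
    w, V = np.linalg.eigh(M)
    return (V * w ** t) @ V.T

def scaled_distances(A, B, ts):
    """Rows (delta_R(A^t, B^t)/t, S(A^t, B^t)/t) for each exponent t in ts."""
    rows = []
    for t in ts:
        At, Bt = spd_power(A, t), spd_power(B, t)
        rows.append((delta_r(At, Bt) / t, s_div(At, Bt) / t))
    return np.array(rows)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 3, 3))
    A, B = G @ G.T + np.eye(3), H @ H.T + np.eye(3)
    print(scaled_distances(A, B, [1.0, 1.5, 2.0, 3.0]))  # each column should be nondecreasing
```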

2.1.1 Translation and contraction 

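Next, we prove an analogue of the following contraction (under translation) result [Bougerol, 1993,
Prop. 1.6]:

$$\delta_R(A + X, A + Y) \le \frac{\alpha}{\alpha + \beta}\,\delta_R(X, Y), \quad\text{for } A \ge 0 \text{ and } X, Y > 0, \tag{2.15}$$

where $\alpha = \max\{\|X\|, \|Y\|\}$ and $\beta = \lambda_{\min}(A)$. This result plays a key role in deriving contractive maps
for solving certain nonlinear matrix equations [Lee and Lim, 2008].

We show that a similar result also holds for the S-Divergence.

Theorem 2.8. Let $X, Y > 0$, and $A \ge 0$; then

$$g(A) := S(A + X, A + Y) \tag{2.16}$$

is a monotonically decreasing convex function of $A$.

Proof. We wish to show that if $A \le B$, then $g(A) \ge g(B)$. Equivalently, we can show that the gradient
$\nabla g(A) \le 0$ [Boyd and Vandenberghe, 2004, Section 3.6]; but this follows, since

$$\nabla g(A) = \Bigl(\frac{(A + X) + (A + Y)}{2}\Bigr)^{-1} - \tfrac{1}{2}(A + X)^{-1} - \tfrac{1}{2}(A + Y)^{-1},$$

which is negative because the map $X \mapsto X^{-1}$ is operator convex (Lemma 2.2).

To prove that $g$ is convex, we look at its Hessian, $\nabla^2 g(A)$. Using the shorthand $P = (A + X)^{-1}$
and $Q = (A + Y)^{-1}$, we see that

$$\nabla^2 g(A) = \tfrac{1}{2}(P \otimes P + Q \otimes Q) - \Bigl(\frac{P^{-1} + Q^{-1}}{2}\Bigr)^{-1} \otimes \Bigl(\frac{P^{-1} + Q^{-1}}{2}\Bigr)^{-1}.$$

Lemmas 2.1 and 2.2 imply that $\bigl(\tfrac{P^{-1} + Q^{-1}}{2}\bigr)^{-1} \le \tfrac{P + Q}{2}$, from which it follows that

$$\nabla^2 g(A) \ge \tfrac{1}{2}(P \otimes P + Q \otimes Q) - \Bigl(\frac{P + Q}{2}\Bigr) \otimes \Bigl(\frac{P + Q}{2}\Bigr).$$

But this is easily seen to be nonnegative, since up to a factor of $\tfrac{1}{4}$ it simplifies down to

$$P \otimes P + Q \otimes Q - P \otimes Q - Q \otimes P = (P - Q) \otimes (P - Q) \ge 0.$$ □

The following corollary is immediate (cf. (2.15)).

Corollary 2.9. Let $X, Y > 0$, $A \ge 0$, and $\beta = \lambda_{\min}(A)$. Then,

$$S(A + X, A + Y) \le S(\beta I + X, \beta I + Y) \le S(X, Y). \tag{2.17}$$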
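The chain of inequalities in Corollary 2.9 lends itself to a quick numerical check. The sketch below assumes real symmetric inputs, with `A` positive semidefinite and `X`, `Y` positive definite; the function names are illustrative only.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def translation_chain(A, X, Y):
    """Return (S(A+X, A+Y), S(bI+X, bI+Y), S(X, Y)) with b = lambda_min(A)."""
    beta = max(float(np.linalg.eigvalsh(A).min()), 0.0)  # guard against tiny negative rounding
    I = np.eye(A.shape[0])
    return s_div(A + X, A + Y), s_div(beta * I + X, beta * I + Y), s_div(X, Y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, G, H = rng.standard_normal((3, 4, 4))
    A = F @ F.T                      # positive semidefinite shift
    X, Y = G @ G.T + np.eye(4), H @ H.T + np.eye(4)
    print(translation_chain(A, X, Y))  # the three values should be nondecreasing
```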

2.1.2 Contraction on a geodesic 

The curve

$$\gamma(t) := A^{1/2}(A^{-1/2}BA^{-1/2})^tA^{1/2}, \quad\text{for } t \in [0, 1], \tag{2.18}$$

parameterizes the unique geodesic between the positive matrices $A$ and $B$; see e.g., [Bhatia, 2007,
Theorem 6.1.6]. On this curve the Riemannian metric satisfies the 'natural' result

$$\delta_R(A, \gamma(t)) = t\,\delta_R(A, B), \qquad t \in [0, 1].$$

We show that the S-Divergence satisfies a similar, albeit slightly weaker result.

Theorem 2.10. Let $A, B > 0$, and $\gamma(t)$ be defined by (2.18). Then,

$$S(A, \gamma(t)) \le t\,S(A, B), \qquad 0 \le t \le 1. \tag{2.19}$$

Proof. The proof follows upon observing that

$$S(A, \gamma(t)) = S\bigl(I, (A^{-1/2}BA^{-1/2})^t\bigr) \overset{(2.2)}{\le} t\,S(I, A^{-1/2}BA^{-1/2}) = t\,S(A, B).$$ □
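To make the geodesic statements concrete, the following sketch evaluates $\gamma(t)$ for real symmetric positive definite matrices and compares both distances against their linear-in-$t$ bounds. The helper names (`spd_power`, `geodesic`) are assumptions for illustration.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def delta_r(X, Y):
    Linv = np.linalg.inv(np.linalg.cholesky(Y))
    lam = np.linalg.eigvalsh(Linv @ X @ Linv.T)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def spd_power(M, t):
    w, V = np.linalg.eigh(M)
    return (V * w ** t) @ V.T

def geodesic(A, B, t):
    """gamma(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}, symmetrized against rounding."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, t) @ Ah
    return (G + G.T) / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 4, 4))
    A, B = G @ G.T + np.eye(4), H @ H.T + np.eye(4)
    for t in (0.25, 0.5, 0.75):
        Gt = geodesic(A, B, t)
        print(t, delta_r(A, Gt) - t * delta_r(A, B), t * s_div(A, B) - s_div(A, Gt))
```

The first printed difference should vanish up to rounding, while the second should be nonnegative.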

2.2 Geometric mean 

In this section we turn our attention to an object that perhaps connects $\delta_R$ and $S$ most intimately: the
matrix geometric mean (GM), which is given by the midpoint of the geodesic (2.18), denoted as

$$A \sharp B := \gamma(1/2) = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}. \tag{2.20}$$

The GM (2.20) has numerous attractive properties (see for instance Ando [1979]); among these, the
following variational characterization is of importance [Bhatia and Holbrook, 2006]:

$$A \sharp B = \operatorname{argmin}_{X > 0}\ \delta_R^2(A, X) + \delta_R^2(B, X). \tag{2.21}$$

A quick calculation shows that $A \sharp B$ is equidistant from $A$ and $B$, i.e., $\delta_R(A, A \sharp B) = \delta_R(B, A \sharp B)$. We
show that the GM enjoys a similar characterization even with $S$.

Theorem 2.11. Let $A, B > 0$. Then,

$$A \sharp B = \operatorname{argmin}_{X > 0}\ h(X) := S(X, A) + S(X, B). \tag{2.22}$$

Moreover, $A \sharp B$ is equidistant from $A$ and $B$, i.e., $S(A, A \sharp B) = S(B, A \sharp B)$.

Proof. If $A = B$, then clearly $X = A$ minimizes $h(X)$. Assume therefore, that $A \neq B$. Ignoring the
constraint $X > 0$ for the moment, we see that any stationary point of $h(X)$ must satisfy $\nabla h(X) = 0$.
Thus,

$$\nabla h(X) = \tfrac{1}{2}\Bigl(\frac{X + A}{2}\Bigr)^{-1} + \tfrac{1}{2}\Bigl(\frac{X + B}{2}\Bigr)^{-1} - X^{-1} = 0$$

$$\implies X^{-1} = (X + A)^{-1} + (X + B)^{-1} \tag{2.23}$$

$$\implies (X + A)X^{-1}(X + B) = 2X + A + B$$

$$\implies B = XA^{-1}X.$$

The last equation is a Riccati equation that is known to have a unique, positive definite solution, the
GM $A \sharp B$ (see [Bhatia, 2007, Prop 1.2.13]).

We must now show that this stationary point is a local minimum, and not a local maximum or
saddle point. Thus, we show that the Hessian is positive definite at the purported minimum $X = A \sharp B$.
The Hessian of $h(X)$ is given by

$$\nabla^2 h(X) = X^{-1} \otimes X^{-1} - \bigl[(X + A)^{-1} \otimes (X + A)^{-1} + (X + B)^{-1} \otimes (X + B)^{-1}\bigr].$$

Writing $P = (X + A)^{-1}$, and $Q = (X + B)^{-1}$, upon using (2.23) we obtain

$$\nabla^2 h(X) = (P + Q) \otimes (P + Q) - P \otimes P - Q \otimes Q$$
$$= (P + Q) \otimes P + (P + Q) \otimes Q - P \otimes P - Q \otimes Q$$
$$= (Q \otimes P) + (P \otimes Q) > 0.$$

Thus, $X = A \sharp B$ is a strict local minimum of (2.22). This local minimum is actually global, because
$\nabla h(X) = 0$ has a unique positive definite solution.

To prove the equidistance property, recall that $A \sharp B = B \sharp A$; then observe that

$$S(A, A \sharp B) = S(A, B \sharp A) = S\bigl(A, B^{1/2}(B^{-1/2}AB^{-1/2})^{1/2}B^{1/2}\bigr)$$
$$= S\bigl(B^{-1/2}AB^{-1/2}, (B^{-1/2}AB^{-1/2})^{1/2}\bigr)$$
$$= S\bigl((B^{-1/2}AB^{-1/2})^{1/2}, I\bigr)$$
$$= S\bigl(B^{1/2}(B^{-1/2}AB^{-1/2})^{1/2}B^{1/2}, B\bigr)$$
$$= S(B \sharp A, B) = S(B, A \sharp B).$$ □
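The claims of Theorem 2.11 can be probed with a short numerical sketch: the geometric mean should solve the Riccati equation $B = XA^{-1}X$, be equidistant from $A$ and $B$ under $S$, and give a value of $h$ no larger than at other positive matrices. Real symmetric positive definite inputs are assumed and the helper names are hypothetical.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def spd_power(M, t):
    w, V = np.linalg.eigh(M)
    return (V * w ** t) @ V.T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}, symmetrized against rounding."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, 0.5) @ Ah
    return (G + G.T) / 2.0

def h(X, A, B):
    """Objective of the variational characterization: S(X, A) + S(X, B)."""
    return s_div(X, A) + s_div(X, B)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E, F = rng.standard_normal((2, 4, 4))
    A, B = E @ E.T + np.eye(4), F @ F.T + np.eye(4)
    G = geometric_mean(A, B)
    print(np.linalg.norm(G @ np.linalg.inv(A) @ G - B))  # Riccati residual
    print(s_div(A, G) - s_div(B, G))                     # equidistance gap
```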

The GM is the midpoint on the geodesic between $A$ and $B$; an arbitrary point on this geodesic
(previously written as $\gamma(t)$) is often also written as

$$A \sharp_t B := A^{1/2}(A^{-1/2}BA^{-1/2})^tA^{1/2} \quad\text{for } t \in [0, 1]. \tag{2.24}$$

For geodesics given by (2.24), $\delta_R$ satisfies the following "cancellation" inequality

$$\delta_R(A \sharp_t B, A \sharp_t C) \le t\,\delta_R(B, C) \quad\text{for } A, B, C > 0, \text{ and } t \in [0, 1]. \tag{2.25}$$

For a proof, see [Bhatia, 2007, Theorem 6.1.12]; we show that a similar inequality holds for $S$.

Theorem 2.12. Let $A, B, C > 0$, and $t \in [0, 1]$. Then,

$$S(A \sharp_t B, A \sharp_t C) \le t\,S(B, C), \qquad t \in [0, 1]. \tag{2.26}$$

Proof. Prop. 1.1 and Theorem 2.3 help prove this claim as follows:

$$S(A \sharp_t B, A \sharp_t C) = S\bigl(A^{1/2}(A^{-1/2}BA^{-1/2})^tA^{1/2}, A^{1/2}(A^{-1/2}CA^{-1/2})^tA^{1/2}\bigr)$$
$$= S\bigl((A^{-1/2}BA^{-1/2})^t, (A^{-1/2}CA^{-1/2})^t\bigr)$$
$$\le t\,S(A^{-1/2}BA^{-1/2}, A^{-1/2}CA^{-1/2})$$
$$= t\,S(B, C).$$ □
2.3 Bounds relating $\delta_R$ and $S$

So far we saw several properties exhibited by both $\delta_R$ and $S$. Now we explore how they directly relate
to each other. Our main result here is the sandwiching inequality (2.28).

Theorem 2.13. Let $A, B > 0$ be $n \times n$ matrices, and let $\lambda(AB^{-1})$ be the vector of eigenvalues of $AB^{-1}$.
Also, let

$$\delta_T(A, B) := \max_{1 \le i \le n}\{|\log\lambda_i(AB^{-1})|\} = \|\log(AB^{-1})\|, \tag{2.27}$$

denote the Thompson metric [Thompson, 1963]. Then, we have the following bounds:

$$8S(A, B) \le \delta_R^2(A, B) \le 2\delta_T(A, B)\bigl(S(A, B) + n\log 2\bigr). \tag{2.28}$$

Proof. First we establish the upper bound and then the lower bound. To that end, it is useful to rewrite
$\delta_R$ as

$$\delta_R(A, B) = \Bigl(\sum\nolimits_i \log^2\lambda_i(AB^{-1})\Bigr)^{1/2}. \tag{2.29}$$

Since $\lambda_i(AB^{-1}) > 0$, we may write $\lambda_i(AB^{-1}) = e^{u_i}$; and therewith obtain

$$\delta_R(A, B) = \|u\|_2 \quad\text{and}\quad \delta_T(A, B) = \|u\|_\infty. \tag{2.30}$$

Using the same notation we also obtain

$$S(A, B) = \sum\nolimits_i \bigl(\log(1 + e^{u_i}) - u_i/2 - \log 2\bigr). \tag{2.31}$$

To relate the quantities (2.30) and (2.31), it is helpful to consider the function

$$f(u) := \log(1 + e^u) - u/2 - \log 2.$$

If $u < 0$, then $\log(1 + e^u) \ge \log 1 = 0$ holds and $-u/2 = |u|/2$; while if $u \ge 0$, then $\log(1 + e^u) \ge \log e^u = u$
holds. For both cases, we therefore have the inequality

$$f(u) \ge |u|/2 - \log 2. \tag{2.32}$$

Since $S(A, B) = \sum_i f(u_i)$, inequality (2.32) leads to the bound

$$S(A, B) \ge -n\log 2 + \tfrac{1}{2}\sum\nolimits_i |u_i| = \tfrac{1}{2}\|u\|_1 - n\log 2. \tag{2.33}$$

From Hölder's inequality we know that $u^Tu \le \|u\|_\infty\|u\|_1$; so we immediately obtain

$$\delta_R^2(A, B) \le 2\delta_T(A, B)\bigl(S(A, B) + n\log 2\bigr).$$

To obtain the lower bound, consider the function

$$g(u, \sigma) := u^2 - \sigma\bigl(\log(1 + e^u) - u/2 - \log 2\bigr). \tag{2.34}$$

The first and second derivatives of $g$ with respect to $u$ are given by

$$g'(u, \sigma) = 2u - \frac{\sigma e^u}{1 + e^u} + \frac{\sigma}{2}, \qquad g''(u, \sigma) = 2 - \frac{\sigma e^u}{(1 + e^u)^2}.$$

Observe that $g'(0, \sigma) = 0$. To ensure that $0$ is the minimizer of (2.34), we now determine the
largest value of $\sigma$ for which $g$ is convex, or equivalently, for which $g'' \ge 0$. Write $z := e^u$; we wish to
ensure that $\sigma z/(1 + z)^2 \le 2$. Since $z > 0$, the AM-GM inequality shows that
$\frac{z}{(1 + z)^2} = \frac{\sqrt{z}}{1 + z}\cdot\frac{\sqrt{z}}{1 + z} \le \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$.
Thus, for $0 \le \sigma \le 8$, the inequality $\sigma z/(1 + z)^2 \le 2$ holds (or equivalently $g''(u, \sigma) \ge 0$). Hence,
$0 = g(0, \sigma) \le g(u, \sigma)$, from which we may conclude that

$$\delta_R^2(A, B) - \sigma S(A, B) = \sum\nolimits_i g(u_i, \sigma) \ge 0, \quad\text{for } 0 \le \sigma \le 8.$$ □
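The sandwiching inequality (2.28) is convenient to test numerically because all three quantities depend only on $u_i = \log\lambda_i(AB^{-1})$. The sketch below assumes real symmetric positive definite inputs; `sandwich` is an illustrative name.

```python
import numpy as np

def sandwich(A, B):
    """Return (8*S, delta_R^2, 2*delta_T*(S + n*log 2)) computed from u_i = log lambda_i(A B^{-1})."""
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    u = np.log(np.linalg.eigvalsh(Linv @ A @ Linv.T))  # same spectrum as A B^{-1}
    n = len(u)
    S = float(np.sum(np.log1p(np.exp(u)) - u / 2.0 - np.log(2.0)))  # formula (2.31)
    dR2 = float(np.sum(u ** 2))                                      # squared (2.29)
    dT = float(np.max(np.abs(u)))                                    # Thompson metric (2.27)
    return 8.0 * S, dR2, 2.0 * dT * (S + n * np.log(2.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, H = rng.standard_normal((2, 5, 5))
    A, B = G @ G.T + np.eye(5), H @ H.T + np.eye(5)
    print(sandwich(A, B))  # the three values should be nondecreasing
```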

3 Metric property 

After briefly exploring connections between $S$ and $\delta_R$, we advance to studying properties of $S$ that are
of independent interest. The most important one amongst them is that $\sqrt{S}$ is actually a metric!

Theorem 3.1. The function $\delta_S := \sqrt{S}$ defines a metric on positive matrices.

The proof of Theorem 3.1 depends on several substeps; we either state or prove these below.

Definition 3.2 ([Berg et al., 1984, Def. 1.1]). Let $\mathcal{X}$ be a nonempty set. A function $\psi : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$
is said to be negative definite if for all $x, y \in \mathcal{X}$ it is symmetric ($\psi(x, y) = \psi(y, x)$), and satisfies the
inequality

$$\sum_{i,j=1}^n c_ic_j\psi(x_i, x_j) \le 0,$$

for all integers $n \ge 2$, and subsets $\{x_i\}_{i=1}^n \subseteq \mathcal{X}$, $\{c_i\}_{i=1}^n \subseteq \mathbb{R}$ with $\sum_{i=1}^n c_i = 0$.
Based on Def. 3.2 is the following famous result of Schoenberg.

Theorem 3.3 ([Berg et al., 1984, Prop. 3.2, Chap. 3]). Let $\psi : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be negative definite. Then,
there is a Hilbert space $\mathcal{H} \subseteq \mathbb{R}^{\mathcal{X}}$ and a mapping $x \mapsto \varphi(x)$ from $\mathcal{X} \to \mathcal{H}$ such that one has the relation

$$\|\varphi(x) - \varphi(y)\|_{\mathcal{H}}^2 = \psi(x, y) - \tfrac{1}{2}\bigl(\psi(x, x) + \psi(y, y)\bigr). \tag{3.1}$$

Moreover, negative definiteness of $\psi$ is necessary for such an embedding to exist.

Theorem 3.3 paves the way to the following lemma.

Lemma 3.4. Let $\delta_s^2(x, y) := \log[(x + y)/(2\sqrt{xy})]$ for positive scalars $x, y$. Then,

$$\delta_s(x, y) \le \delta_s(x, z) + \delta_s(y, z) \quad\text{for all } x, y, z > 0. \tag{3.2}$$

Proof. If we show that $\psi(x, y) = \log(x + y)$ is negative definite, then, since $\delta_s^2(x, y) = \psi(x, y) - \tfrac{1}{2}(\psi(x, x) +
\psi(y, y))$, Theorem 3.3 will immediately imply the triangle inequality (3.2). Thus, we now prove
that $\psi$ is negative definite. Equivalently, we may show (see [Berg et al., 1984, Thm. 2.2, Chap. 3]) that
$e^{-\beta\psi(x, y)} = (x + y)^{-\beta}$ is a positive definite function for $\beta > 0$ and $x, y > 0$. It suffices to show that the
matrix

$$H = [h_{ij}] = [(x_i + x_j)^{-\beta}], \qquad 1 \le i, j \le n,$$

is positive for any integer $n \ge 1$, and points $\{x_i\}_{i=1}^n \subset \mathbb{R}_{++}$. Now, observe that

$$h_{ij} = \frac{1}{\Gamma(\beta)}\int_0^\infty e^{-t(x_i + x_j)}t^{\beta - 1}\,dt, \tag{3.3}$$

where $\Gamma(\beta) = \int_0^\infty e^{-t}t^{\beta - 1}\,dt$ is the well-known Gamma function. Thus, with $f_i(t) = e^{-tx_i}t^{(\beta - 1)/2} \in
L_2([0, \infty))$, we see that $h_{ij} = \langle f_i, f_j\rangle/\Gamma(\beta)$, which implies that $H \ge 0$. □



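Using Lemma 3.4 we can prove the following simple but important result.

Lemma 3.5. Let $x, y, z \in \mathbb{R}^n_{++}$ and $p \ge 1$. Then,

$$\Bigl(\sum\nolimits_i \delta_s^p(x_i, y_i)\Bigr)^{1/p} \le \Bigl(\sum\nolimits_i \delta_s^p(x_i, z_i)\Bigr)^{1/p} + \Bigl(\sum\nolimits_i \delta_s^p(y_i, z_i)\Bigr)^{1/p}. \tag{3.4}$$

Proof. Lemma 3.4 implies that for the scalars $x_i$, $y_i$, and $z_i$,

$$\delta_s(x_i, y_i) \le \delta_s(x_i, z_i) + \delta_s(y_i, z_i), \qquad 1 \le i \le n.$$

Now exponentiate this, sum up, and invoke Minkowski's inequality. □

Lemma 3.5, in turn, helps prove the following triangle inequality.

Theorem 3.6. Let $X, Y, Z > 0$ be diagonal. Then,

$$\delta_S(X, Y) \le \delta_S(X, Z) + \delta_S(Y, Z). \tag{3.5}$$

Proof. Simply notice that for diagonal matrices $X$ and $Y$,

$$S(X, Y) = \delta_S^2(X, Y) = \sum\nolimits_i \delta_s^2(X_{ii}, Y_{ii}),$$

and then invoke Lemma 3.5 with $p = 2$. □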

Next, we recall an important determinantal inequality for positive matrices. 
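Theorem 3.7. Let $A, B > 0$. Let $\lambda^\downarrow(X)$ denote the vector of eigenvalues of $X$ sorted in decreasing
order; define $\lambda^\uparrow(X)$ likewise. Then,

$$\prod_{j=1}^n \bigl(\lambda_j^\downarrow(A) + \lambda_j^\downarrow(B)\bigr) \le \det(A + B) \le \prod_{j=1}^n \bigl(\lambda_j^\downarrow(A) + \lambda_j^\uparrow(B)\bigr). \tag{3.6}$$

Proof. See [Bhatia, 1997, Exercise VI.7.2]. □

Corollary 3.8. Let $A, B > 0$. Let $\mathrm{Eig}^\downarrow(X)$ denote the diagonal matrix with $\lambda^\downarrow(X)$ as its diagonal;
define $\mathrm{Eig}^\uparrow(X)$ likewise. Then,

$$S(\mathrm{Eig}^\downarrow(A), \mathrm{Eig}^\downarrow(B)) \le S(A, B) \le S(\mathrm{Eig}^\downarrow(A), \mathrm{Eig}^\uparrow(B)),$$
$$\delta_S(\mathrm{Eig}^\downarrow(A), \mathrm{Eig}^\downarrow(B)) \le \delta_S(A, B) \le \delta_S(\mathrm{Eig}^\downarrow(A), \mathrm{Eig}^\uparrow(B)).$$

Finally, we prove a very useful congruence result.

Lemma 3.9. Let $A > 0$, and let $B$ be Hermitian. There is a matrix $P$ for which

$$P^*AP = I, \quad\text{and}\quad P^*BP = D, \quad\text{where } D \text{ is diagonal}. \tag{3.7}$$

Proof. Although this is a well-known result, we include a brief proof for completeness. Let $A = U\Lambda U^*$,
and define $S = \Lambda^{-1/2}U$. Then the matrix $S^*U^*BUS$ is Hermitian; so let the unitary matrix $V$ diagonalize
it to $D$. Now set $P = USV$, whereby

$$P^*AP = V^*S^*U^*U\Lambda U^*USV = V^*U^*\Lambda^{-1/2}\Lambda\Lambda^{-1/2}UV = I,$$

and by construction, $P^*BP = V^*S^*U^*BUSV = D$. □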
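The congruence of Lemma 3.9 can be sketched in a few lines. Note that the code uses a standard variant of the construction, $P = U\Lambda^{-1/2}V$ (without the extra factor of $U$ that appears in the proof), and assumes real symmetric inputs; the function name is hypothetical.

```python
import numpy as np

def simultaneous_diagonalizer(A, B):
    """Return (P, d) with P^T A P = I and P^T B P = diag(d), for SPD A and symmetric B."""
    lam, U = np.linalg.eigh(A)
    W = U / np.sqrt(lam)                 # W = U Lambda^{-1/2}, so W^T A W = I
    d, V = np.linalg.eigh(W.T @ B @ W)   # orthogonal V diagonalizes the whitened B
    return W @ V, d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G, M = rng.standard_normal((2, 4, 4))
    A = G @ G.T + np.eye(4)   # positive definite
    B = M + M.T               # symmetric, possibly indefinite
    P, d = simultaneous_diagonalizer(A, B)
    print(np.linalg.norm(P.T @ A @ P - np.eye(4)), np.linalg.norm(P.T @ B @ P - np.diag(d)))
```

Both printed residuals should be at rounding-error level; this is the reduction used to bring two of the three matrices in the triangle inequality to the pair $(I, D)$.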

Accoutered with the above results, we are now ready to prove Theorem 3.1.

Proof (Theorem 3.1). Symmetry, nonnegativity, and definiteness of $\delta_S$ are immediate; the only
nontrivial part is the triangle inequality. Let $X, Y, Z > 0$ be arbitrary. From Lemma 3.9 we know that
there is a matrix $P$ such that $P^*XP = I$ and $P^*YP = D$. Since $Z > 0$ is arbitrary, for brevity we
write just $Z$ instead of $P^*ZP$. Also, since $S(P^*XP, P^*YP) = S(X, Y)$, proving the triangle inequality
reduces to showing that

$$\delta_S(I, D) \le \delta_S(I, Z) + \delta_S(D, Z). \tag{3.8}$$

Consider the diagonal matrices $D^\downarrow$ and $\mathrm{Eig}^\downarrow(Z)$. Theorem 3.6 asserts that

$$\delta_S(I, D^\downarrow) \le \delta_S(I, \mathrm{Eig}^\downarrow(Z)) + \delta_S(D^\downarrow, \mathrm{Eig}^\downarrow(Z)). \tag{3.9}$$

Prop. 1.1(i) implies that $\delta_S(I, D) = \delta_S(I, D^\downarrow)$ and $\delta_S(I, Z) = \delta_S(I, \mathrm{Eig}^\downarrow(Z))$, while Corollary 3.8 shows
that $\delta_S(D^\downarrow, \mathrm{Eig}^\downarrow(Z)) \le \delta_S(D, Z)$. Hence, we finally obtain

$$\delta_S(I, D) \le \delta_S(I, Z) + \delta_S(D, Z).$$ □
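As a sanity check on the metric property, one can sample random triples of positive matrices and record the smallest slack in the triangle inequality for $\delta_S = \sqrt{S}$. This is a sketch on real symmetric positive definite matrices with illustrative function names; random sampling can of course only fail to find violations, not prove their absence.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def delta_s(X, Y):
    """Square root of the S-Divergence; max(., 0) guards against tiny negative rounding."""
    return float(np.sqrt(max(s_div(X, Y), 0.0)))

def worst_triangle_slack(n=4, trials=200, seed=0):
    """Smallest value of delta_s(X,Z) + delta_s(Y,Z) - delta_s(X,Y) over random triples."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        F, G, H = rng.standard_normal((3, n, n))
        X, Y, Z = F @ F.T + 0.1 * np.eye(n), G @ G.T + 0.1 * np.eye(n), H @ H.T + 0.1 * np.eye(n)
        worst = min(worst, delta_s(X, Z) + delta_s(Y, Z) - delta_s(X, Y))
    return worst

if __name__ == "__main__":
    print(worst_triangle_slack())  # expected to be nonnegative
```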
3.1 Hilbert space embeddings 

We proved above that $\delta_S$ is a metric; and Lemma 3.4 showed that for scalars $\delta_s$ embeds isometrically
into a Hilbert space. One may ask if $\delta_S(X, Y)$ also admits such an embedding.

Theorem 3.3 implies that such an embedding exists if and only if $\delta_S^2(X, Y) = S(X, Y)$ is negative
definite; equivalently, if and only if

$$e^{-\beta S(X, Y)} = \frac{\det(X)^{\beta/2}\det(Y)^{\beta/2}}{\det((X + Y)/2)^{\beta}}$$

is a positive definite kernel for every $\beta > 0$. It suffices to check whether the matrix

$$H_\beta = [h_{ij}] = [\det(X_i + X_j)^{-\beta}], \qquad 1 \le i, j \le n, \tag{3.10}$$

is positive for every integer $n \ge 1$ and arbitrary positive matrices $X_1, \ldots, X_n$.

Unfortunately, a quick numerical experiment reveals that $H_\beta$ can be indefinite. A counterexample
is given by the following positive matrices

$$X_1 = \begin{bmatrix} 0.1406 & 0.0347 \\ 0.0347 & 0.1779 \end{bmatrix},\quad
X_2 = \begin{bmatrix} 2.0195 & 0.0066 \\ 0.0066 & 0.2321 \end{bmatrix},\quad
X_3 = \begin{bmatrix} 1.0309 & 0.8694 \\ 0.8694 & 1.2310 \end{bmatrix},$$
$$X_4 = \begin{bmatrix} 1.0924 & 0.0609 \\ 0.0609 & 1.2520 \end{bmatrix},\quad\text{and}\quad
X_5 = \begin{bmatrix} 0.2870 & -0.4758 \\ -0.4758 & 2.3569 \end{bmatrix}, \tag{3.11}$$

and the choice $\beta = 0.1$ (several other $\beta$ also work), for which $\lambda_{\min}(H_\beta) = -0.0017$. This counterexample
destroys hopes of embedding the metric space $(\mathbb{P}_d, \delta_S)$ isometrically into a Hilbert space.
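The numerical experiment described above can be repeated with a short sketch. The five matrices are entered as they are printed, rounded to four decimals, and their labelling is an assumption (the definiteness of the kernel matrix does not depend on the ordering). The helper names are hypothetical, and the computed eigenvalue should be compared against the reported value rather than taken for granted.

```python
import numpy as np

# Matrices from the counterexample, as printed (four decimals); the order is assumed.
XS = [
    np.array([[0.1406, 0.0347], [0.0347, 0.1779]]),
    np.array([[2.0195, 0.0066], [0.0066, 0.2321]]),
    np.array([[1.0309, 0.8694], [0.8694, 1.2310]]),
    np.array([[1.0924, 0.0609], [0.0609, 1.2520]]),
    np.array([[0.2870, -0.4758], [-0.4758, 2.3569]]),
]

def kernel_matrix(mats, beta):
    """H_beta with entries det(X_i + X_j)^(-beta)."""
    n = len(mats)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = np.linalg.det(mats[i] + mats[j]) ** (-beta)
    return H

def min_eigenvalue(mats, beta):
    return float(np.linalg.eigvalsh(kernel_matrix(mats, beta)).min())

if __name__ == "__main__":
    for beta in (0.1, 0.5, 1.0):
        print(beta, min_eigenvalue(XS, beta))
```

For $2 \times 2$ matrices, values $\beta \ge 1/2$ are expected to give a positive kernel matrix, so the runs with $\beta = 0.5$ and $\beta = 1$ serve as a control for the run with $\beta = 0.1$.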

Although the matrix (3.10) is not positive in general, it leads us to wonder: For what choices of $\beta$ is
$H_\beta$ positive? Theorem 3.10 answers this question, and characterizes the values of $\beta$ that are necessary
and sufficient for $H_\beta$ to be positive (for all integers $n \ge 1$).

Theorem 3.10. Let $X_1, \ldots, X_n > 0$ be $d \times d$. The $n \times n$ matrix $H_\beta$ defined by (3.10) is positive, if
and only if $\beta$ satisfies

$$\beta \in \bigl\{\tfrac{j}{2} : j \in \mathbb{N}, \text{ and } 1 \le j \le (d - 1)\bigr\} \cup \bigl\{\gamma : \gamma \in \mathbb{R}, \text{ and } \gamma > \tfrac{1}{2}(d - 1)\bigr\}. \tag{3.12}$$

Proof. We first prove the "if" part. Recall therefore, the Gaussian integral

$$\int_{\mathbb{R}^d} e^{-x^TXx}\,dx = \pi^{d/2}\det(X)^{-1/2}.$$

Now define $f_i := \pi^{-d/4}e^{-x^TX_ix} \in L_2(\mathbb{R}^d)$, and compute the inner product

$$\langle f_i, f_j\rangle = \frac{1}{\pi^{d/2}}\int_{\mathbb{R}^d} e^{-x^T(X_i + X_j)x}\,dx = \det(X_i + X_j)^{-1/2},$$

which shows that $H_{1/2}$ is positive. From the Schur product theorem we know that the elementwise
product of two positive matrices is again positive. So, in particular $H_\beta$ is positive whenever $\beta$ is an
integer multiple of $1/2$. To extend the result to all $\beta$ covered by (3.12), we invoke another integral
representation, the multivariate Gamma function, defined as [Muirhead, 1982, §2.1.2]:

$$\Gamma_d(\beta) := \int_{A > 0} e^{-\mathrm{Tr}(A)}\det(A)^{\beta - (d + 1)/2}\,dA, \quad\text{where } \beta > \tfrac{1}{2}(d - 1), \tag{3.13}$$

where the integration is over the set of $d \times d$ positive matrices. Define now
$f_i := c\,e^{-\mathrm{Tr}(AX_i)}\det(A)^{(2\beta - d - 1)/4}$, with the normalizing constant $c = \Gamma_d(\beta)^{-1/2}$, and then
compute the inner product

$$\langle f_i, f_j\rangle = c^2\int_{A > 0} e^{-\mathrm{Tr}(A(X_i + X_j))}\det(A)^{\beta - (d + 1)/2}\,dA = \det(X_i + X_j)^{-\beta},$$

which exists whenever $\beta > \tfrac{1}{2}(d - 1)$. Thus, $H_\beta$ is positive for all $\beta$ defined by (3.12).

The "only if" part is a deeper result that follows from the theory of symmetric cones. More
specifically, since the positive matrices form a symmetric cone, and the function $1/\det(X)$ is decreasing
on this cone (i.e., $1/\det(X + Y) \le 1/\det(X)$ for all $X, Y > 0$), an appeal to Theorem VII.3.1
of Faraut and Korányi [1994] allows us to conclude our claim. □
We end our discussion by identifying two subclasses of positive matrices that can be embedded
isometrically into some Hilbert space (Theorems 3.11 and 3.12).

Theorem 3.11. Let $\mathcal{X}$ be a set of positive matrices that commute with each other. Then, $(\mathcal{X}, \delta_S)$ can
be isometrically embedded into some Hilbert space.

Proof. Since the matrices commute, they can be simultaneously diagonalized, which reduces the problem
to embedding diagonal matrices. But for diagonal matrices, $S(X, Y) = \sum_i \delta_s^2(X_{ii}, Y_{ii})$, which is a
nonnegative sum of negative definite kernels (see Lemma 3.4), and is therefore itself negative definite.
Now invoke Theorem 3.3. □

Theorem 3.12. Let $\mathcal{X}$ be the set of "perturbed" diagonal matrices, that is

$$\mathcal{X} = \{I + uu^T : u \in \mathbb{R}^d, \|u\| = 1\}. \tag{3.14}$$

Then, $(\mathcal{X}, \delta_S)$ embeds isometrically into some Hilbert space.

Proof. Let $X = I + xx^T$ and $Y = I + yy^T$ lie in $\mathcal{X}$. Define $\alpha := x^Ty$ and consider

$$S(X, Y) = \log\det((X + Y)/2) - \tfrac{1}{2}\log\det(XY)$$
$$= \log\Bigl(\frac{9 - \alpha^2}{4}\Bigr) - \tfrac{1}{2}\log 4 = \log(9 - \alpha^2) - \log 8.$$

We must show that $e^{-\beta S(X, Y)}$ is positive definite; thus, consider the kernel

$$k_\beta(x, y) := \frac{1}{(9 - (x^Ty)^2)^\beta}, \qquad \|x\| = \|y\| = 1. \tag{3.15}$$

We must show that $k_\beta(x, y)$ is positive definite. We use a result of Bapat [1988], which shows
that if a matrix $[a_{ij}]$ with positive entries is negative definite, then $[1/a_{ij}^\beta]$ is positive for all $\beta > 0$. The
function $h(x, y) := 9 - (x^Ty)^2$ is easily seen to be negative definite, since $e^{-t\,h(x, y)}$ is positive definite
for all $t > 0$; formally,

$$e^{-t(9 - (x^Ty)^2)} = e^{-9t}\sum_{j \ge 0}\frac{t^j}{j!}(x^Ty)^{2j},$$

which is clearly positive definite. □
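The closed form used in the proof of Theorem 3.12 is easy to confirm numerically, as is positivity of the kernel (3.15) on a sample of unit vectors. The sketch below makes the usual assumptions (real vectors, randomly drawn test points) and uses hypothetical function names.

```python
import numpy as np

def s_div(X, Y):
    ld = lambda M: np.linalg.slogdet(M)[1]
    return float(ld((X + Y) / 2.0) - 0.5 * (ld(X) + ld(Y)))

def s_rank_one(x, y):
    """Closed form of S(I + x x^T, I + y y^T) for unit vectors x, y."""
    alpha = float(x @ y)
    return float(np.log(9.0 - alpha ** 2) - np.log(8.0))

def unit_vectors(m, d, rng):
    U = rng.standard_normal((m, d))
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kernel_min_eig(U, beta):
    """Smallest eigenvalue of the Gram matrix of k_beta(x, y) = (9 - (x^T y)^2)^(-beta)."""
    K = (9.0 - (U @ U.T) ** 2) ** (-beta)
    return float(np.linalg.eigvalsh(K).min())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = unit_vectors(6, 4, rng)
    x, y = U[0], U[1]
    I = np.eye(4)
    print(s_div(I + np.outer(x, x), I + np.outer(y, y)), s_rank_one(x, y))
    print(kernel_min_eig(U, 0.1))
```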



4 Discussion and Future Work 

In this paper we considered the Symmetric Stein Divergence (S-Divergence) defined on positive definite
matrices. We derived numerous results that uncovered qualitative similarities between the S-Divergence
and the Riemannian metric on the manifold of positive definite matrices. More interestingly, we also
showed that the square root of the S-Divergence actually defines a metric; though, a counterexample
showed that this metric is not isometrically embeddable into any Hilbert space.
Several directions of future work are open; we mention some below.

• Deriving refinements of the main inequalities presented in this paper.

• Extending Theorem 2.11 to define geometric means for more than two matrices.

• Studying properties of the metric space $(\mathbb{P}_d, \delta_S)$.

• Fully characterizing the subclass $\mathcal{X} \subset \mathbb{P}_d$ of positive matrices for which $(\mathcal{X}, \delta_S)$ admits an isometric
Hilbert space embedding.

• Extending $S$ to define divergence functions for more than two input matrices. For example, one
fairly natural extension could be



We note that such a generalization is not possible for the Riemannian metric. 

• Identifying applications where $S$ (or $\delta_S$) can be useful.

We hope that our paper encourages other researchers to investigate new properties and applications 
of the S-Divergence.